Community
feddit.de
Mhm I disagree with your second point. Since you can’t use any styling on Gemini objects, you won’t get table layout as we had in the dark ages of HTML. With tables like in Markdown you can just lay out tabular data in an actual table. Mhm I guess with the plaintext environment we can still link to external resources like images and other multimedia or interactives.
Community
sh.itjust.works
A mystery for the ages: why does Microsoft buy so many gaming companies only to not use their IPs? If I owned Microsoft, I’d go fucking crazy: “We’re putting Sheogorath, Crash Bandicoot, Master Chief, Joanna Dark, and Corvus from Heretic into Heroes of the Storm. New Spyro, New Banjo Kazooie, New Jet Force Gemini, fuck it we own Activision right? When’s the last time there was a Pitfall Harry game? Right now mother fucker! Wash it down with a bottle of Hexen 3! Maybe a new Battletoads that doesn’t suck? Sorry about that reboot, that was a little cringe. But ya know what’s not cringe? A Crash and Spyro crossover game that’s actually a 3D Platformer and isn’t a Skylanders game… But those of you who like Skylanders? Skylanders HD Remasters; toys no longer required! Bowser and DK in the Switch versions.” Or at the very least put the ID Software games Activision owns onto Steam, since Activision and ID are owned by the same company now (Heretic 2 and Wolfenstein (2009) are in danger of becoming lost media). Like, I never got that: there are ID Software (which is now part of Bethesda) games that aren’t available because Activision owns them, but Bethesda and Activision are owned by the same guy now.
Community
lemmy.dbzer0.com
Every minute that passes, Gemini (not the Google one!) looks more viable, which is already a shame because as I described in lemm.ee before it went down, that itself feels like “Gopher but in the format of a brutalist buttplug”. What we need is some sort of return to HTML + CSS 1.0, or a web engine that simply ditches JS, so that development can be tackled by Individuals again.
Community
artemis.camp
Gemini exists, and the protocol is deliberately designed to be hard to extend, but it serves documents in a markdown-like format instead of modern HTML/CSS/JS.
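For anyone wondering what "markdown-like" means in practice: Gemini's document format (gemtext) is line-oriented, so the type of each line is decided by its first few characters. Here is a minimal sketch of that idea in Python. The function name `classify_gemtext` and the tuple layout are my own invention for illustration, and it only covers the line types I'm sure about (headings, links, list items, quotes, preformatted toggles, plain text).

```python
def classify_gemtext(text):
    """Classify each line of a gemtext document by its leading characters.

    Hypothetical helper for illustration; returns a list of (kind, value) tuples.
    """
    result = []
    preformatted = False
    for line in text.splitlines():
        if line.startswith("```"):
            # A line starting with three backticks toggles preformatted mode.
            preformatted = not preformatted
            result.append(("toggle", line[3:].strip()))
        elif preformatted:
            # Inside a preformatted block nothing else is interpreted.
            result.append(("pre", line))
        elif line.startswith("=>"):
            # Link lines: "=>", then a URL, then an optional label.
            parts = line[2:].split(maxsplit=1)
            url = parts[0] if parts else ""
            label = parts[1] if len(parts) > 1 else url
            result.append(("link", (url, label)))
        elif line.startswith("#"):
            # Only three heading levels exist, so cap the count at 3.
            level = min(len(line) - len(line.lstrip("#")), 3)
            result.append(("heading", (level, line[level:].strip())))
        elif line.startswith("* "):
            result.append(("list", line[2:]))
        elif line.startswith(">"):
            result.append(("quote", line[1:].strip()))
        else:
            result.append(("text", line))
    return result
```

Note there is no inline markup at all: a link always takes a whole line, which is why there is nothing like a nested parser here.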
Community
lemmy.world
The Gemini podcast is going to condense your text and make it conversational, but it will necessarily lose detail in the process. A better recommendation is the Eleven Labs Reader, it’ll just read any text or file you throw at it with top tier voice models. Can use it for free and they have paid plans for more use. They also have a “podcast” generator option like Gemini, but I haven’t tried it so can’t vouch for the quality. I use Eleven Labs all the time for things I want to read, like email newsletters, industry publications, etc but never find the time to sit down and read. Now I can have AI read them to me while I walk the dog. Super handy imo
Community
lemmygrad.ml
Yes in the first case, and mostly yes in the second. Wall of text incoming lol, I’ll put it in spoiler tags.

::: spoiler Audio
There are both specialized models for transcription, and general models for transcription+translation+other things. Gemini is able to do both since it’s a huge model. I sent it a quick mp3 I found online of a podcast segment and it sent me the translated transcription in one shot. It does it on the fly since it’s a multimodal model, but you can also have transcription models (many of which run locally, they don’t need to be as big as Deepseek or Gemini) that will simply transcribe the audio. These work in almost any language too and can even separate speakers. Then with the transcription in hand you can have it translated anywhere.

Tbh I’m not even sure if there’s still a space for specialized transcription (speech-to-text) models when we have the models we have now. I guess if you want to try and run them locally so you don’t send your data to Google. But yeah, they’re nowhere near what the huge models can do - for example a local transcription model might not be able to handle different languages within the same file, and it won’t do translation itself of course, you’ll need an LLM for that.

There’s a lot of new stuff coming out and Gemini Pro (have to put it on pro if you use it) is pretty impressive. Gemini is able to understand the nuances and abstract elements of a painting it’s never seen before (AI-generated for example), and it even understood the speakers’ tone in the file I sent it, which was in French - it was a podcast for French learners and Gemini correctly, unprompted, pointed out the “hesitation in Sebastien’s voice when he admits he doesn’t speak fluent Japanese”. So yeah this is where we’re at today lol. It picks up on these nuances perfectly well.
:::

::: spoiler Languages handled
This is more difficult. The big bottleneck from what I understand is high-resource and low-resource languages, in two ways.

Traditionally HRL and LRL simply mean whether there’s a lot of media available in a given language or not. Languages that were originally oral and not traditionally written down would be considered low-resource, for example. With LLMs this typically means stuff you find on the internet. So some languages might be widely spoken but not have a “lot” of availability online because their speakers don’t use the internet as much, or it’s locked down, or it’s just a drop in the ocean compared to all the English and Mandarin content that gets published on the web every single second. A language like Persian for example is low-resource - they have a huge internet space by themselves but it’s locked out so it’s hard to get enough data to train LLMs on Persian.

The second thing is that when training LLMs you have to make a choice about what languages you feed it, because “space” remains limited when training the LLM. So a language that makes up only 1% of the training set remains low-resource for that set.

All that to say, I’ve had very varying results when trying out text translations. Romance+Germanic languages are assured, Mandarin is now standard because there are so many Chinese LLMs (sometimes during training they start thinking in Chinese btw lol), but outside of that it’s kind of a crapshoot. Japanese-to-English worked for example, but was a bit too “word-for-word” and robotic. English-to-Persian still carried some mistakes and a very “machine” way of writing/speaking. I’m sure there’s a solution for it and eventually this hurdle will be overcome, but at this time I would not readily trust LLM translations that are outside of that sphere of languages.

When doing translations I try different models with the same prompt and pass the output on to a native speaker who can tell me how they see it and what we could try to improve, but I haven’t had much success just changing the prompt tbh. I think it’s a current limitation of models.
:::
Community
midwest.social
“A beautiful size.” I like that. It describes the phenomenon exactly. Like, Gemini (the protocol) is too small. It’s below the threshold of usefulness. I still cross publish (serve) my content in Gemini format, but I never actually browse it anymore. (Tiny rant) That said, I think Gemini went too far; gmi is smaller than it needs to be, and it was locked down and made immutable prematurely. Client/server interactions need to be a little more complex; I think the extreme simplicity has contributed to its lack of adoption: it’s almost impossible to serve a functional and user-friendly search engine with it, leading to what’s largely a dark network - not intentionally dark, but filled with isolated, unreachable nodes and practically no discovery. And with its failure, it shut the door on that solution space. It was popular enough that early adopters who tried it are going to shy away from something Gemini-like, but a little more full-featured. I’m bitter about Gemini; if only Solderpunk had had a little more vision.
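To make the "extreme simplicity" concrete, here is a rough sketch of the whole client/server exchange as I understand the spec: the client sends one URL terminated by CRLF (over TLS, port 1965), and the server answers with a single header line made of a two-digit status and a meta string, followed by the body on success. Search works by the server replying with status 10 (input wanted) and the client re-requesting the same URL with the answer as the query string. The function names below are hypothetical, and the sketch deliberately leaves out the TLS socket handling.

```python
from urllib.parse import quote


def build_request(url):
    """Return the bytes of a Gemini request: the absolute URL plus CRLF."""
    encoded = url.encode("utf-8")
    if len(encoded) > 1024:
        raise ValueError("Gemini request URLs are limited to 1024 bytes")
    return encoded + b"\r\n"


def parse_response_header(header):
    """Split a response header line into (status, meta)."""
    line = header.decode("utf-8").rstrip("\r\n")
    status, _, meta = line.partition(" ")
    if len(status) != 2 or not status.isdigit():
        raise ValueError("expected a two-digit status code")
    return int(status), meta


def answer_input_prompt(url, user_input):
    """After a status 10 reply, repeat the request with the input as the query."""
    return url + "?" + quote(user_input)
```

So a "search engine" is one prompt, one free-text query string, and one page of link lines back. There are no forms, no headers, and no way for the client to send anything richer, which is the limitation being complained about above.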
Community
lemmygrad.ml
Lack of official client - disagree. Is there an official web client? Well, Google probably thinks there is. Microsoft also thinks there is. Apple is certain there is. I do dislike the links on a separate line. I suppose if I were to invent Gemini I would have been tempted by Markdown. I can see the problems with it, but it would allow some very simple styling. And as for the privacy, it feels like nobody sees this in the same way as me. Laws to prevent tracking and surveillance can never work. It’s in a capitalist country’s best interest to allow it, and even if I could live somewhere that had legal protections that I thought were reasonable, the internet is global.
Community
lemmy.sdf.org
The problem is that no browser can allow you to escape the horror that is web standards & practices that have been developed over decades […] practically the entire web is reliant on JavaScript, […] I’ve been saying it for a while: continuing to play catch-up is a losing move for Mozilla or for any independent browser maker. The real move is to switch to, or at least integrate, an alternate internet, something that uses a protocol that is simpler and more limited by design - just get rid of JavaScript (or of “remote execution”, really) and you instantly get a much leaner, much more secure internet design. I’ve heard pretty good things about the Gemini protocol, but IMHO they went too far, too extremist, into the “text internet” philosophy, and the result is a raw downgrade from Gopher. Gopher could actually be a good option.
Community
rss.ponder.cat
Google has a new AI tool that can help iPhone users to better grasp complicated or confusing writing online. The Simplify feature, rolling out in the Google app on iOS starting today, generates a simpler, more digestible version of any highlighted text without leaving the current web page. Simplify is built on Google’s Gemini AI model, and was developed by Google Research to make technical jargon easier for anyone to understand without losing key details. For example, Simplify can break down medical terms like “emphysema” (a condition that damages the air sacs in lungs) and “fibrosis” (dense connective tissue or scarring that develops in response to damage) in reports and journals, preventing readers from needing to reference terminology on a separate web page. Google says that people found the simplified versions to be “significantly more helpful than the original complex text” in its testing, but acknowledged the study “has limitations” and that “ongoing vigilance” is required to monitor errors. The feature can be found in the iOS Google app by highlighting any text on a website, and tapping the Simplify icon from the menu options that appear. When asked if Simplify will be made available for Android and desktop Chrome users, Google spokesperson Jennifer Kutz told The Verge that “we don’t have anything to announce yet, but we’re always looking to bring useful features to more of our products.” From The Verge via this RSS feed
Community
rss.ponder.cat
On Thursday, Anthropic announced significant upgrades to its AI assistant Claude, extending its research capabilities to run for up to 45 minutes before delivering comprehensive reports. The company also expanded its integration options, allowing Claude to connect with popular third-party services. Much like Google’s Deep Research (which debuted on December 11) and ChatGPT’s deep research features (February 2), Anthropic first announced its own “Research” feature on April 15. Each can autonomously browse the web and other online sources to compile research reports in document format, and open source clones of the technique have debuted as well. Now, Anthropic is taking its Research feature a step further. The upgraded mode enables Claude to conduct “deeper” investigations across “hundreds of internal and external sources,” Anthropic says. When users toggle the Research button, Claude breaks down complex requests into smaller components, examines each one, and compiles a report with citations linking to original sources. From Ars Technica - All content via this RSS feed
Community
rss.ponder.cat
After several years of escalating AI hysteria, we are all familiar with Google’s desire to put Gemini in every one of its products. That can be annoying, but NotebookLM is not—this one actually works. NotebookLM, which helps you parse documents, videos, and more using Google’s advanced AI models, has been available on the web since 2023, but Google recently confirmed it would finally get an Android app. You can get a look at the app now, but it’s not yet available to install. Until now, NotebookLM was only a website. You can visit it on your phone, but the interface is clunky compared to the desktop version. The arrival of the mobile app will change that. Google said it plans to release the app at Google I/O in late May, but the listing is live in the Play Store early. You can pre-register to be notified when the download is live, but you’ll have to tide yourself over with the screenshots for the time being. NotebookLM relies on the same underlying technology as Google’s other chatbots and AI projects, but instead of a general purpose robot, NotebookLM is only concerned with the documents you upload. It can assimilate text files, websites, and videos, including multiple files and source types for a single agent. It has a hefty context window of 500,000 tokens and supports document uploads as large as 200MB. Google says this creates a queryable “AI expert” that can answer detailed questions and brainstorm ideas based on the source data. From Ars Technica - All content via this RSS feed
Community
rss.ponder.cat
Apple reported its latest quarterly earnings on Wednesday under the backdrop of a court ruling that’s poised to upend the company’s App Store business and tariff uncertainty that could spur price increases for devices including the iPhone. At least on this occasion, Apple’s revenue numbers weren’t top of mind for tech industry onlookers like they ordinarily would be. Still, overall revenue in fiscal Q2 2025 was $95.4 billion — a 5 percent jump compared to the year-ago quarter — and services reached another all-time high. The iPhone, Mac, and iPad businesses all performed well thanks to new products; the iPad was particularly strong, with revenue up 15 percent year over year. In recent months, Apple has released hardware including new MacBook Airs, a more powerful Mac Studio, and the refreshed iPad Air tablet. And the iPhone 16E, designed to compete with lower-cost smartphones, debuted in February. But Apple’s software team has been going through a rough patch following a series of stumbles and embarrassments. The company’s attempts to build out its own artificial intelligence capabilities that rival OpenAI’s ChatGPT, Google Gemini, and other leaders in the category have been slow going. In early March, long-promised improvements to the company’s Siri assistant were delayed. Apple is rumored to be integrating Google’s Gemini into its Apple Intelligence software suite this fall to help keep pace. Meanwhile, the effects of President Trump’s tariffs are already reverberating across many industries, but Cook downplayed any major impacts in an interview with CNBC. He pointed to Apple’s well-distributed supply chain and manufacturing operation as a potential buffer. “If you look at the US, over half of the US sales of iPhone come from India,” he said. “If you look at the other products, Mac and iPad and AirPods and the Watch, almost all of the country of origin is Vietnam.” Much has been made about the possibility of Apple’s next iPhone lineup getting a price hike.
“With an iPhone, you really have to go a step lower and look at the individual parts and where they come from,” he said. Apple is also navigating significant legal battles. In a Wednesday ruling, Judge Yvonne Gonzalez Rogers excoriated Apple executives including CEO Tim Cook for deliberately trying to limit and mollify a 2021 ruling intended to loosen the iPhone maker’s grip over the App Store. Apple has appealed Rogers’ order, but if it holds, companies including Epic, Spotify, and Patreon are planning to seize the opportunity to more freely sidestep Apple’s in-app payments and steer users to the web. From The Verge via this RSS feed
Community
midwest.social
Hugo isn’t a server, per se. It’s basically just a template engine. It was originally focused on turning markdown into web pages, with some extra functionality around generating indexes and cross-references that is really what sets it apart from just a simple rendering engine. And by now, much of its value is in the huge number of site templates built for Hugo. But what Hugo does is take some metadata and whatever markdown content you have, and generate a static web site. You still need a web server pointed at the generated content. You run Hugo on demand to regenerate the site whenever there’s new content (although there is a “watch” mode, where it’ll watch for changes and regenerate the site in response). It’s a little fancier than that; it doesn’t regenerate content that hasn’t changed. You can have it create whatever output format you want - mine generates both HTML and gmi (Gemini) sites from the same markdown. But that’s it: at its core, it’s a static site template rendering engine. It is absolutely suitable for creating a portfolio site. Many of the templates are indeed such. And it’s not hard to make your own templates, if you know the front-end technologies.
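To give a feel for what "the same markdown, two outputs" involves on the gmi side, here is a small Python sketch of the one transformation that matters most: gemtext has no inline links, so each markdown link has to be pulled out of the sentence and emitted as its own `=>` line. This is not how Hugo does it (Hugo does this kind of thing through its templates); `markdown_paragraph_to_gemtext` is a made-up helper that only handles plain `[label](url)` links.

```python
import re

# Matches simple inline markdown links: [label](url), no titles, no nesting.
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def markdown_paragraph_to_gemtext(paragraph):
    """Replace inline links with their labels and list the links afterwards."""
    links = LINK.findall(paragraph)
    # Keep the label text in the sentence so it still reads naturally.
    lines = [LINK.sub(r"\1", paragraph)]
    for label, url in links:
        # Gemtext link line: "=>", URL, then a human-readable label.
        lines.append(f"=> {url} {label}")
    return "\n".join(lines)
```

For example, `markdown_paragraph_to_gemtext("See [Hugo](https://gohugo.io/) docs.")` gives the sentence `See Hugo docs.` followed by the line `=> https://gohugo.io/ Hugo`. The HTML output would keep the link inline, so the two renderings diverge from the same source paragraph.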